Context-Adaptive-based Image Captioning by Bi-CARU

Abstract

Image captions are abstract expressions of content representations using text sentences, helping readers to better understand and analyse information between different media. Encoder-decoder neural networks provide a rational structure for tasks such as image coding and caption prediction. This work introduces a Convolutional Neural Network (CNN) to Bidirectional Content-Adaptive Recurrent Unit (Bi-CARU) model (CNN-to-Bi-CARU) that uses a bidirectional structure to consider contextual features and capture the major features of an image. The encoded form of the image is passed into the forward and backward layers of CARU, respectively, to refine word prediction and provide the output for captioning. An attention layer is also introduced to collect the features produced by the context-adaptive gate in CARU, aiming to compute the weighting for relationship extraction and determination. In experiments, the proposed CNN-to-Bi-CARU model outperforms other advanced models in the field, achieving a detailed representation of captions. The model obtains a score of 41.28 on BLEU@4, 31.23 on METEOR, 61.07 on ROUGE-L, and 133.20 on CIDEr-D, making it competitive for image captioning on the MSCOCO dataset.
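The data flow described in the abstract (a forward pass and a backward pass over encoded features, followed by attention weighting) can be illustrated with a toy sketch. This is not the authors' model: `gated_step` is a generic stand-in gated update rather than the CARU equations, each encoded feature is reduced to a single float for brevity, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_step(h, x):
    """Stand-in gated recurrent update on scalars (assumption: NOT the CARU cell)."""
    gate = sigmoid(h + x)      # how much of the old state to overwrite
    cand = math.tanh(h + x)    # candidate new state
    return (1.0 - gate) * h + gate * cand


def run_direction(features, reverse=False):
    """Run the recurrent cell over the sequence, left-to-right or right-to-left."""
    order = reversed(features) if reverse else features
    h, states = 0.0, []
    for x in order:
        h = gated_step(h, x)
        states.append(h)
    # Re-align backward states with the original time steps.
    return states[::-1] if reverse else states


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bidirectional_attention(features):
    """Combine forward and backward states, then pool them with softmax weights."""
    fwd = run_direction(features)
    bwd = run_direction(features, reverse=True)
    combined = [f + b for f, b in zip(fwd, bwd)]
    weights = softmax(combined)  # toy attention scores: the combined states themselves
    context = sum(w * c for w, c in zip(weights, combined))
    return fwd, bwd, weights, context


fwd, bwd, weights, context = bidirectional_attention([0.5, -1.0, 2.0])
print(weights, context)
```

In a real captioning model the features would be vectors from a CNN and the attention scores would be learned; the sketch only shows how the two directions are aligned per time step before weighting.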

Similar resources

Phrase-based Image Captioning

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representat...

Improving Image Captioning by Concept-Based Sentence Reranking

This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task. We improve Google’s CNN-LSTM model by introducing concept-based sentence reranking, a data-driven approach which exploits the large amounts of concept-level annotations on Flickr. Different from previous usage of concept detection that is tailored to specific image captioning models, the proposed approac...

Context-based adaptive image resolution upconversion

We propose a practical context-based adaptive image resolution upconversion algorithm. The basic idea is to use a low-resolution (LR) image patch as a context in which the missing high-resolution (HR) pixels are estimated. The context is quantized into classes and for each class an adaptive linear filter is designed using a training set. The training set incorporates the prior knowledge of the ...

Unpaired Image Captioning by Language Pivoting

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale image-caption paired corpus might not be available. We present an approach to this ...

Domain-Specific Image Captioning

We present a data-driven framework for image caption generation which incorporates visual and textual features with varying degrees of spatial structure. We propose the task of domain-specific image captioning, where many relevant visual details cannot be captured by off-the-shelf general-domain entity detectors. We extract previously-written descriptions from a database and adapt them to new q...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3302512